##1. Kokybės kontrolė : metilinamo tankio funkcijos tarp CpG salų elementų
Kiekvienai pozicinei eilutei paskaičiuojamas tos eilutės mėginių vidurkis.
anno <- readRDS("../output/annotation.rds")
beta <- readRDS("../output/betab.rds")
data <- read.csv("../output/correctedSample.csv")
n<- length(rownames(beta))
#skaičiuoja vidurkius
means <- matrix(ncol=2, nrow=n)
for(i in 1:n){
means[i,] <- c((rownames(beta)[i]), mean(beta[i,]))
}
colnames(means) <- c("name", "mean")
head(as.data.frame(means))
## name mean
## 1 cg01707559 0.171268794480463
## 2 cg02004872 0.117750665305373
## 3 cg02494853 0.0302013463252889
## 4 cg03244189 0.198499881618238
## 5 cg03706273 0.0299952362271422
## 6 cg04023335 0.34280657826654
#Kiekvienai salos grupei skaičiuoajamas CpG tankis.
Šiuo atveju yra 6 salos.
## Loading required package: S4Vectors
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which, which.max, which.min
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## [1] "Island" "S_Shore" "OpenSea" "N_Shore" "N_Shelf" "S_Shelf"
##
## Call:
## density.default(x = relate)
##
## Data: Island (27700605 obs.); Bandwidth 'bw' = 0.004328
##
## x y
## Min. :-0.01296 Min. : 0.00001
## 1st Qu.: 0.24352 1st Qu.: 0.22143
## Median : 0.50000 Median : 0.31227
## Mean : 0.50000 Mean : 0.97362
## 3rd Qu.: 0.75647 3rd Qu.: 0.62755
## Max. : 1.01295 Max. :10.62007
##
## Call:
## density.default(x = relate)
##
## Data: S_Shore (8972500 obs.); Bandwidth 'bw' = 0.01231
##
## x y
## Min. :-0.0368 Min. :0.000015
## 1st Qu.: 0.2316 1st Qu.:0.543589
## Median : 0.5000 Median :0.631717
## Mean : 0.5000 Mean :0.930500
## 3rd Qu.: 0.7684 3rd Qu.:1.103510
## Max. : 1.0368 Max. :3.960065
##
## Call:
## density.default(x = relate)
##
## Data: OpenSea (30277470 obs.); Bandwidth 'bw' = 0.00602
##
## x y
## Min. :-0.01803 Min. :0.000001
## 1st Qu.: 0.24099 1st Qu.:0.305274
## Median : 0.50000 Median :0.513307
## Mean : 0.50000 Mean :0.964245
## 3rd Qu.: 0.75902 3rd Qu.:1.031196
## Max. : 1.01803 Max. :4.908815
##
## Call:
## density.default(x = relate)
##
## Data: N_Shore (11468520 obs.); Bandwidth 'bw' = 0.01167
##
## x y
## Min. :-0.03477 Min. :0.000013
## 1st Qu.: 0.23265 1st Qu.:0.553297
## Median : 0.50007 Median :0.641694
## Mean : 0.50007 Mean :0.933947
## 3rd Qu.: 0.76749 3rd Qu.:1.115567
## Max. : 1.03491 Max. :3.712139
##
## Call:
## density.default(x = relate)
##
## Data: N_Shelf (4359155 obs.); Bandwidth 'bw' = 0.007042
##
## x y
## Min. :-0.01979 Min. :0.000001
## 1st Qu.: 0.24043 1st Qu.:0.230926
## Median : 0.50065 Median :0.397615
## Mean : 0.50065 Mean :0.959798
## 3rd Qu.: 0.76087 3rd Qu.:1.071314
## Max. : 1.02109 Max. :5.282310
##
## Call:
## density.default(x = relate)
##
## Data: S_Shelf (3895545 obs.); Bandwidth 'bw' = 0.006787
##
## x y
## Min. :-0.01769 Min. :0.000001
## 1st Qu.: 0.24179 1st Qu.:0.193128
## Median : 0.50127 Median :0.384813
## Mean : 0.50127 Mean :0.962524
## 3rd Qu.: 0.76074 3rd Qu.:1.083690
## Max. : 1.02022 Max. :5.445911
X ašis grafike rodo tikimybę gauti x reikšmę.
colors <- c("red","yellow", "green", "gold", "lightblue" ,"blue" )
plot(island1, col = colors[1], main = "Relation_to_Islands", xlab = NA)
lines(island2, col = colors[2])
lines(island3, col = colors[3])
lines(island4, col = colors[4])
lines(island5, col = colors[5])
lines(island6, col = colors[6])
legend("top",
c(islands),
fill=c(colors)
)
Plot (be mėginių vardų) pavaizduoja atstumų dydžius tarp mėginių.
Mėginiai užima labai daug vietos, todėl atvaizdavimas su mėginių pavadinimais nėra prasmingas,
nes medis tiesiog nesimato.
hc<-hclust(as.dist(1-cor(beta)), method = "complete")
plot(hc, xlab="Samples", main="Dissimilarity = 1 - Correlation", labels= FALSE)
## Loading required package: dynamicTreeCut
## Loading required package: fastcluster
##
## Attaching package: 'fastcluster'
## The following object is masked from 'package:stats':
##
## hclust
##
##
## Attaching package: 'WGCNA'
## The following object is masked from 'package:S4Vectors':
##
## cor
## The following object is masked from 'package:stats':
##
## cor
#Dendograma mėginiams pavaizduoti.
Mėginių pavadinimai atitinkamai paversti į spalvas.
Mėginiai x ašyje yra sutampančios grupės. Jos susidarė dėl panašaus atstumo koreliuojant.
Y ašyje matomi panašūs atstumai - jie yra maži. Positions Nurodo pozicijas pagal spalvą( toliau yra legenda, kurioje nurodytos spalvų reikšmės). Taip pat parodyta lytis ir tiriamieji - sveikas asmuo ar sergantis.
head(data)
## X sample_name slide type source_name_ch1 organism_ch1
## 1 18 DLPFC_Alzheimer_Br5078 GSM3584161 genomic DLPFC Homo sapiens
## 2 19 DLPFC_Control_Br1987 GSM3584162 genomic DLPFC Homo sapiens
## 3 20 DLPFC_Control_Br1764 GSM3584163 genomic DLPFC Homo sapiens
## 4 23 DLPFC_Alzheimer_Br5107 GSM3584166 genomic DLPFC Homo sapiens
## 5 24 DLPFC_Alzheimer_Br5105 GSM3584167 genomic DLPFC Homo sapiens
## 6 25 DLPFC_Control_Br1157 GSM3584168 genomic DLPFC Homo sapiens
## sample_plate Array brNum tissue tissue_region sample age sex race
## 1 Plate 1 R05C02 Br5078 brain DLPFC Alzheimer 64 M CAUC
## 2 Plate 1 R01C02 Br1987 brain DLPFC Control 54 M CAUC
## 3 Plate 1 R03C01 Br1764 brain DLPFC Control 56 M CAUC
## 4 Plate 1 R05C02 Br5107 brain DLPFC Alzheimer 80 M CAUC
## 5 Plate 1 R01C02 Br5105 brain DLPFC Alzheimer 76 M CAUC
## 6 Plate 1 R03C01 Br1157 brain DLPFC Control 60 M AA
## negcontrol_pc1 negcontrol_pc2 neun_pos snppc1 molecule_ch1 label_ch1
## 1 5.649698 -3.8772291 0.2502178 -0.0553337 genomic DNA Cy3/Cy5
## 2 -8.078151 -1.1767034 0.2337783 -0.0491986 genomic DNA Cy3/Cy5
## 3 -14.318467 -1.1313009 0.2161111 -0.0489529 genomic DNA Cy3/Cy5
## 4 4.431130 -4.2548791 0.2754533 -0.0523536 genomic DNA Cy3/Cy5
## 5 -7.481998 9.2532100 0.3023755 -0.0556155 genomic DNA Cy3/Cy5
## 6 5.891752 -0.3727747 0.1653021 0.0908303 genomic DNA Cy3/Cy5
## taxid_ch1
## 1 9606
## 2 9606
## 3 9606
## 4 9606
## 5 9606
## 6 9606
## description
## 1 Postmortem human brain tissue samples from four brain regions in an Alzheimers disease case-control series used to generate Human450k DNA methylation data.
## 2 Postmortem human brain tissue samples from four brain regions in an Alzheimers disease case-control series used to generate Human450k DNA methylation data.
## 3 Postmortem human brain tissue samples from four brain regions in an Alzheimers disease case-control series used to generate Human450k DNA methylation data.
## 4 Postmortem human brain tissue samples from four brain regions in an Alzheimers disease case-control series used to generate Human450k DNA methylation data.
## 5 Postmortem human brain tissue samples from four brain regions in an Alzheimers disease case-control series used to generate Human450k DNA methylation data.
## 6 Postmortem human brain tissue samples from four brain regions in an Alzheimers disease case-control series used to generate Human450k DNA methylation data.
## description.1 platform_id
## 1 1-C02 GPL13534
## 2 1-C03 GPL13534
## 3 1-C04 GPL13534
## 4 1-C08 GPL13534
## 5 1-C09 GPL13534
## 6 1-C10 GPL13534
## Basename
## 1 C:/Users/asta1/Desktop/stud/6sem/biomeds/2darbas/input/GSE125895/supplement/GSE125895_idat/GSM3584161_9283265070_R05C02
## 2 C:/Users/asta1/Desktop/stud/6sem/biomeds/2darbas/input/GSE125895/supplement/GSE125895_idat/GSM3584162_9283265146_R01C02
## 3 C:/Users/asta1/Desktop/stud/6sem/biomeds/2darbas/input/GSE125895/supplement/GSE125895_idat/GSM3584163_9283265163_R03C01
## 4 C:/Users/asta1/Desktop/stud/6sem/biomeds/2darbas/input/GSE125895/supplement/GSE125895_idat/GSM3584166_9297949021_R05C02
## 5 C:/Users/asta1/Desktop/stud/6sem/biomeds/2darbas/input/GSE125895/supplement/GSE125895_idat/GSM3584167_9297949030_R01C02
## 6 C:/Users/asta1/Desktop/stud/6sem/biomeds/2darbas/input/GSE125895/supplement/GSE125895_idat/GSM3584168_9297949038_R03C01
## sentrix_id sample_id
## 1 9283265070 GSM3584161_9283265070_R05C02
## 2 9283265146 GSM3584162_9283265146_R01C02
## 3 9283265163 GSM3584163_9283265163_R03C01
## 4 9297949021 GSM3584166_9297949021_R05C02
## 5 9297949030 GSM3584167_9297949030_R01C02
## 6 9297949038 GSM3584168_9297949038_R03C01
unique(data$source_name_ch1)
## [1] DLPFC ERC HIPPO
## Levels: DLPFC ERC HIPPO
data$loc_to_num <- data$source_name_ch1
data$loc_to_num <- gsub("DLPFC", 1, data$loc_to_num)
data$loc_to_num <- gsub("ERC", 2, data$loc_to_num)
data$loc_to_num <- gsub("HIPPO", 3, data$loc_to_num)
data$loc_to_num <- as.integer(data$loc_to_num)
plotDendroAndColors(hc, labels2colors(data[,c("source_name_ch1", "sex", "sample")]),
main = "Sample dendrogram and trait heatmap",
dendroLabels=data$loc_to_num,
addGuide = TRUE,
groupLabels = c("Position", "Gender", "Sample")
)
cols <- data.frame(labels2colors(unique(data[,c("source_name_ch1", "sex", "sample")])), unique(data[,c("source_name_ch1", "sex", "sample")]))
col1 <- c(unique(as.character(cols$X1)), unique(as.character(cols$X2)), unique(as.character(cols$X3)))
col2 <- c(unique(as.character(cols$source_name_ch1)), unique(as.character(cols$sex)), unique(as.character(cols$sample)))
plot(NULL,xaxt='n',yaxt='n',bty='n',ylab='',xlab='', xlim=0:1, ylim=0:1)
legend("topleft", legend = col2,
fill = col1)
#install.packages("BiocManager")
#BiocManager::install("WGCNA")
Surandami variabiliausi duomenys t.y. surikiuojami ir naudojami pirmieji 5000 pozicijų.
Heatmap pavaizduoja hierarchinį klasterizavimą, reikšmės pakeistos spalvomis. Raudona reiškia mažą reikšmę. Šviesesnė spalva reiškia didesnę.
rowvariances <- apply(beta,1,var)
orderedVarianceIndexes <- order(rowvariances,decreasing = TRUE)
heatmap(beta[orderedVarianceIndexes[1:5000], ], Colv = NA,
xlab="Location", ylab="Sentrix Id",
labRow = data$sentrix_id,
labCol = data$source_name_ch1,
main="heatmap")
Iš grfiko, galime matyti, kad daugiausia variacijų sudaro pirmos 3 komponentės. Visos kitos komponentės turi mažai variacijų.
components <- prcomp(t(beta))
screeplot(components, main = "Components")
Poruojant komponentes, kai kurios poros turi panašumų pvz (PC4 ir PC5), kai mažėja PC4, mažės ir PC5.
components$x[1:5,1:2]
## PC1 PC2
## GSM3584161_9283265070_R05C02 -4.901724 18.2607528
## GSM3584162_9283265146_R01C02 -10.870001 0.2751873
## GSM3584163_9283265163_R03C01 -12.042775 -5.2408401
## GSM3584166_9297949021_R05C02 -11.140049 3.3627905
## GSM3584167_9297949030_R01C02 -9.025372 13.4253857
pairs(components$x[,1:5], col = labels2colors(data$source_name_ch1), main = "PCA-plot", pch = data$loc_to_num)
Raudona reiškia dideles reikšmes, geltona - mažas. Likusios komponentės yra panašesnės. Ne tokie dideli skirtumai kaip pirmoje komponentėje. Šitaip vaizduojant duomenų matricą, galima greičiau rasti kintamuosius. Šiuo atveju komponenčių ir mėginių pasiskirstymą.
heatmap(components$x[,1:5], Colv = NA,
labRow = data$sample)
#heatmap(components$rotation)
## Loading required package: plotly
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:S4Vectors':
##
## rename
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
Interaktyvus heatmap, kuris nurodo mėginių ir komponenčių reikšmes.
x<-colnames(components$x)
y<- rownames(components$x)
p <- plot_ly(x=x, y=y,
z = components$x,
type = "heatmap",
colorscale= "blues"
) %>%
layout(xaxis = list(title="Components"), yaxis = list(title="Samples"))
p